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BUS TRANSACTION REORDERING IN A In the case of data tenure, during address arbitration, the 

COMPUTER SYSTEM HAVING CPU arbitrates for mastership of the data bus. After the CPU 

U NORDERED SLAVES is the bus master, during the data transfer phase, it samples 

(P^ELD OF THE INVENTION ,he , da,a b "f. f ° r ^ °P era f io " s or driv f s lhe data J™ ?™ 

$\f 5 wnte operations. Data termination signals occur in the data 

The present invention relates to computer architecture, in termination phase. Data termination signals are required 

particular to computer architecture for small computer sys- after each data beat in a data transfer. In a single-beat- 

tems such as personal computers. transaction, the data termination signals also indicates the 

STATE OF THE ART end °^ tne tenure > while in burst accesses, the data termi- 

m - n „ , . , , 10 nation signals apply to individual beats and indicate the end 

The PowerPC computer arch.tecture, co-developed by of ^ , enure on , after , he fina , d>u ^ 

Apple Computer, represents a departure from prior- . , c t , , 
generation small computer architectures. PowerPC J Address-only transfers use ;only the address bus, with no 
machines currently sold by Apple are based largely on the data . transfer mv ° lved : 71115 feature 1S Particularly useful in 
Motorola MPC601 RISC microprocessor. Other related „ mula-master and multiprocessor environments .where exter- 
processors, including the MPC 604, MPC 603, MPC 603e, 15 ^1 control of on-chip primary caches and TLB (translation 
and MPC 602 are currently available and additional related bu * er > entnes \* M ? Wc " Additionally, the 
processor including the MPC 620 will be readily available in MPC60x Py 0Vldes a \ cX ? ^ability that supports an efficient 
the future. The MPC60x permits separate address bus ten- sn00 P m S ' P™ toco1 for ^ ms With muhl P 1 t e raeraor y 
ures and data bus tenures, where tenure is denned as the s y stems 0 ncludl °g caches ) that m ust remain coherent, 
period of bus mastership. In other words, rather than con- Pipelining and split-bus transactions, while they do not 
sidering the system bus as an indivisible resource and inherently reduce memory latency, can greatly improve 
arbitrating for access to the entire bus, the address and data effective bus-memory throughput. The MPC60x bus proto- 
buses are considered as separate resources, and arbitration col does not constrain the maximum number of levels of 
for access to these two buses may be performed indepen- pipelining that can occur on the bus between multiple 
dently. A transaction, or complete exchange between two masters. In a system in which multiple devices must corn- 
bus devices, is minimally comprised of an address tenure; P ele for tbe system bus, external arbitration is required. The 
one or more data tenures may also be involved in an external arbiter must control the pipeline depth and synchro- 
exchange. There are two kinds of transactions: address/data nization between masters and slaves, 
and address-only. ^ In a traditional pipelined implementation, data bus tenures 

A tenure consists of three phases: arbitration, transfer, and are kept in strict order with respect to address tenures, 
termination. During termination, a signal occurs that marks However, external hardware can further decouple the 
the end of the tenure. The same signal is used to acknowl- address and data buses, allowing the data tenures to occur 
edge the transfer of an address or data beat. A beat corre- out of order with respect to the address tenures. Second- 
sponds generally to a particular state of the address bus or 35 generation PowerPC computers include computers whose 
the data bus. Transfers include both single -be at transfers, in architecture was especially designed for high performance 
which a single piece of data is transferred, and burst data and that incorporated such hardware. This architecture sup- 
transfers, in which a burst of four data beats is transferred. ports true split-bus operation with ordered slaves and 

Referring more particularly to FIG. 1, note that the ordered masters. "Ordered" means each master and each 

address and data tenures are distinct from one another and 40 slave has its own independent FIFO structure supporting 

that both consist of three phases— arbitration, transfer, and "ordered" service to transactions posted to it. If a slave 

termination. FIG. 1 shows a data transfer that consists of a receives three transactions A, B, and C, then it will respond 

single-beat transfer (up to 64 bits). In a four-beat burst t0 A first, B second, and C third. If a master performs 

transfer, by contrast, data termination signals are required transactions D, E, and F, then it expects servicing of those 

for each beat of data, but re-arbitration is not required. 45 transactions in the order of D first, E second, and F third. 

Having independent address and data tenures allows address There can be up to a selected number of outstanding 

pipelining (indicated in FIG. 1 by the fact that the data tenure master/slave pair transactions in the architecture at one time, 

begins before the address tenure ends) and split-bus trans- In one preferred embodiment, this selected number is three 

actions to be implemented at the system level. Address outstanding pair transactions. As a result, in the foregoing 

pipelining allows new address bus transactions to begin 50 architecture, an expansion bridge may concurrently have 

before the current data bus transaction has finished by one outstanding slave transaction to it and one outstanding 

overlapping the data bus tenure associated with a previous master transaction from it. Although ordered masters and 

address bus tenure with one or more successive address slaves, as opposed to unordered masters and slaves, provide 

tenures. Split-bus transaction capability allows the address an overall simplification to system architecture, they can 

bus and data bus to have different masters at the same time. 55 lead t0 deadlocks when there are conflicting completion 

For clarity, the basic functions of address and data tenures dependencies, 

will be discussed in somewhat greater detail. Deadlock occurs in a computer system when one resource 

In the case of address tenure, during address arbitration, cannot complete an access to another resource, and the 

address bus arbitration signals are used to gain mastership of access blocks other resources from performing transactions 

the address bus. Assuming the CPU to be the bus master, it 60 on ^ c bus - Livelock occurs in a computer system when one 

then transfers the address on the address bus during the resource cannot complete an access to another resource, 

address transfer phase. The address signals, together with does not block resources from performing transactions on 

certain transfer attribute signals discussed in greater detail me DUS > DUt 00 forward progress can be made due to the 

hereinafter, control the address transfer. After the address resource's inability to complete its access, 

transfer phase, the system uses the address termination 65 Due to the plethora of design methodologies and imple- 

phase to signal that the address tenure is complete or that it mentations utilized by expansion card vendors, systems are 

must be repeated. most prone to deadlocks and live locks when there is an 




5,996,036 

3 4 

expansion bridge in the system. Some potential deadlocks devices and for generating for each slave device a mullibit 

may be detected and prevented at the bridge level; however, signal which when active identifies a transaction within the 

other pieces of the overall solution may need to be imple- transaction queue of the slave device, 
mented at a higher level in system arbitration. 

The main reason that a deadlock or livelock occurs is that 5 <£) ^ DESCRIPTION OF THE DRAWINGS 

each of two different resources that communicate with each ^ present invention may be further understood from the 

other assumes that it has top priority in the system. following description in conjunction with the appended 

Unfortunately, when they communicate with each other this drawing. In the drawing: 

causes a conflict, and if one does not back off its access, the . .„ . . , f 

end result is deadlock or livelock. » . f , a dla & ram Crating overlapping enures for a 

single-beat transfer on a conventional MPC601 bus; 

In the architecture of certain Power PC computers of the ~ . . , . c 

... . . . . . r Ann . . FIG. 2 is a system-level block diagram of a computer 

assignee, the top priority bus is known as the ARBus; it is . - . . 4 . & . , r 

.u L « . system in which the present invention may be used; 

the one bus assumed to never have to back off an access. r J 

However, there may be a need for the ARBus to communi- FIG 3 15 a block diagram of the memory controller 300 

cate with an ISA bus behind an expansion bridge. As history 15 of FIG. 2; 

recalls, the ISA bus design assumed that any initiated access FIG. 4 is a timing diagram showing conventional usage of 

would complete; therefore, an ISA master would not have to me MPC601 bus; 

back off its access. Therein lies the problem. The Power PC FIG. 5 is a timing diagram showing usage of the ARBus 

architecture, in one instance, chose the ARBus to be the bus (a superset of the MPC601 bus) in the high-performance 

to not back off, and the PC-world chose the ISA bus to be 20 computer architecture of FIG. 2; 

the bus to not back off. This conflict of interest could result FIG. 6 Is a block.diaeram of the arbiter 600 of FIG- 3; r cP - 

in deadlock. ™ ./feT^frffi ' 

FIG. 7 is-a-b16ck diagrarmofrnB exparMonbridge vOOof . / A* 3 

In another instance, the Power PC architecture may incor- pjQ 2* A- UZZJUj 

porate a PCI bus-to-PCI bus ("PCI2PCI") bridge having an nr ' Q .|i . . a ,« i • u- u add 

. , , , . , , . * ? t 25 FIG. 8 illustrates a deadlock m which an ARBus master 

interlocking behavior that disallows access to its slave port _ , _ . . . f „ Ar>T> 

• j c .« n^^n^ . • . ... . read of an expansion bridge is followed by an ARBus master 

on one side of the PC12PCI bridge while its master on the ^ mQJJ ^ 0 . 

same side of the PCI2PCI bridge has a transaction to rea ° memor y» 

perform. This behavior also means that the PCI2PCI bridge FIG 9 illustrates a deadlock in which an ARBus master 

assumes that it does not have to be backed off, and any readof an expansion bndge is followed by an ARBus master 

communication between the ARBus and a target behind the 12 hlt or allocate operation; 

PCI2PCI bridge could result in deadlock. FIG. 10 illustrates a deadlock in which a processor read 

Although decoupling the address and data buses in a of an expansion bridge is followed by a processor write to 

computer system enables bus utilization to be greatly tnal ex P ansion bridge; 

increased, it would be desirable to further increase bus 35 FIG. 11 illustrates a deadlock in which a Bus Grant signal 

utilization beyond what can reasonably be achieved in a and an Address Retry signal occur concurrently; 

system having both ordered masters and ordered slaves. FIG. 12 illustrates a deadlock in which a Bus Request 

Especially desirable would be a computer architecture in signal and an Address Retry signal occur concurrently; 

which bus utilization is increased and in which deadlocks FIG. 13 illustrates a deadlock in which expansion bridges 

are more readily avoided. 40 read each otner concurrently; 

SUMMARY OF THE INVENTION ^ illustrates a deadlock in which one master 

attempts to read both expansion bridges; 

A mechanism is provided for reordering bus transactions j ffl 

to increase bus utilization m a computer system in which a . # !. . . . Z ^ 

,. . ...... • , , . i reads a tar g et behind an opposite expansion bndge; 

split-transaction bus is bndged to a single-envelope bus. In 45 r & 

one embodiment, both masters and slaves are ordered, nG 16 Pirates a deadlock in which a PCI bus master 
simplifying implementation. In another embodiment, the read S ets stuck behmd a P osted PCI bus masler ^ 
system is more loosely coupled with only masters being FIG. 17 illustrates a deadlock in which the ARBus trans- 
ordered. Greater bus utilization is thereby achieved. In action limit is hit, and accesses cannot complete; 
accordance with one embodiment of the invention, a queu- 50 FIG. 18 illustrates a deadlock in which one expansion 
ing structure includes multiple master queues and multiple bridge, with an outstanding ARBus read, accepts a read from 
slave queues. The queuing structure receives bus grant another expansion bridge; 

signals and respective slave acknowledge signals from FIG. 19 is a block diagram of another embodiment of the 

respective slave devices. Each time an address bus grant is arbiter 600 of FIG. 3; 

issued a record is entered in the queuing structure, the record 55 na 2 0 is a block diagram showing the input and output 

comprising a first entry in a master queue identified by the signa is of the a^m^ 60 3' of FIG. 19; 

address bus grant signals, and a second entry in a slave nG n ^ a ^ ^ showin ^ ^ and 

queue identified by the slave acknowledge signals. The first s{ k of ^ 603 , of nG ft {q ^ ^ 

entry identifies a target slave device m accordance with the . . . . t . , f 

slave acknowledge signals, and the second entry identifies 60 . n ^ ^ * a A bl °' k l a f^, ™ S ?« ^ 

an originating master device in accordance with the address sl S nals of lhe ArbDatSM 604' of FIG. 19; 

bus grant signals. A matching circuit is responsive to queue ¥lG - 23 * a bIock diagram of a bit filter portion of the 

entries from the queuing structure for producing match bits ArbDatSM 604' of FIG. 19; 

identifying selected records the first entry of which is at the FIG. 24 is a block diagram showing the input and output 

head of a master queue. A data arbitration circuit is respon- 65 signals of the ArbDatSM 604' of FIG. 19 in greater detail; 

sive to the match bits and to queue entries from the queuing FIG. 25 is a block diagram showing the input and output 

structure for generating data bus grant signals for the master signals of the ARtryGen block 613' of FIG. 19; and 
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FIG. 26 is a block diagram showing the input and output logic. Both the main memory controller 302 and the cache/ 

signals of the ARtryGen bock 613' of FIG. 19 in greater ROM controller 305 exchange control signals with the 

detail; arbiter 600, which executes overall control of the memory 

controller 300 and which is more particularly the subject of 

^D^. DETAILED DESCRIPTION OF THE 5 the following description. 

INVENTION ^ afbiter 600 includes a register file ( no t shown) that 

In the following description, the system architecture of a mav °e written and read by the CPU 203 across the register 
computer system in which the present invention may be used data DUS 217. The register file includes, in addition to 
will first be described, including a description of the numerous base address registers, various ID, configuration 
MPC601 bus, the ARBus, which is a superset of the 10 and timing registers. The particulars of these registers are 
MPC601 bus, a system arbiter and an expansion bridge. not essential to an understanding of the present invention 
Deadlock avoidance will then be described, beginning with and wil1 not •» further described. The arbiter 600 inputs 
a description of the types of deadlocks and livelocks that various control signals from and outputs various control 
may occur in the system, followed by a description of signals to a control bus 309. Some of the control signals 
specific deadlock and livelock situations for both a system 15 carried by the control bus 309 are part of the conventional 
having a single expansion bridge and a system having two PowerPC 601 microprocessor interface. The majority of the 
or more expansion bridges. Rules will be identified for signals carried by the control bus 309, however, are side- 
avoiding deadlock. These rules will then be summarized, band information signals used in accordance with the 
both for the case of a single expansion bridge and for the present invention to independently control the address bus 
case of two or more expansion bridges. Finally, the manner 20 206 and the data bus 205. 

in which the rules are implemented in the system will be Prior to describing in detail the manner in which these 

described. side-band information signals are used to decouple the 

Referring now to FIG. 2, the present invention may be address bus 206 and the data bus 205, it will be useful to 

* used in a computer system of the type shown. A CPU 203 consider what is termed herein conventional usage of the 

(for example a Power PC 601 microprocessor) is connected 25 PowerPC 601 microprocessor interface, 

to a system bus 204, including a data bus 205, an address bus As shown in FIG. 1, address tenure and data tenure both 

206, and a control bus (not shown). A memory subsystem have arbitration, transfer and termination phases. Each of 

208 includes, in the illustrated embodiment, a main memory these phases involves the exchange of respective handshak- 

209, a read-only memory 211, and a level-two cache 3Q ing signals. Referring to FIG. 4, the handshaking signals that 

memory 212. The CPU 203, through the system bus 204, is characterize the address arbitration phase are a bus request 

connected directly to the level-two cache memory 212. The signal BR and a bus grant signal BG. The bus request signal 

CPU 203 is connected indirectly to the main memory 209 BR is an output signal of the CPU 203. The bus grant signal 

and the read-only memory 211, through a datapath circuit is an input signal of the CPU 203 and is output by the arbiter 

221 and a memory controller 300. In general, the datapath 600. Both the bus request signal BR and the bus grant signal 

circuit 221 provides for 64- or 128-bit reads from and writes 35 BG relate to the address bus 206. When the CPU 203 has 

to memory, in either big-endian or little-endian mode. The received the bus grant signal BG, it is free to enter the 

memory controller 300 controls the various memory devices address transfer phase. 

within the memory subsystem 208 in response to signals on During the address transfer phase, a transfer start signal 

the system bus 204 and, in particular, provides address and TS is asserted by the CPU 203 when the CPU 203 begins to 

control signals (i.e., RAS and CAS) to the main memory drive the address bus 206. The address is decoded by a slave 

209. The datapath circuit 221 and the memory controller 300 device as belonging to that address, i.e., falling within the 

are connected by a register data bus 217. device's assigned address space. During the address termi- 

Also shown is an optional secondary processor 218 nation phase, the slave device asserts the address acknowl- 

which, like the CPU 203, may be a Power PC 601 micro- 45 edge signal AACK after it has sampled the address on the 

processor for example. address bus 206. 

The system bus 204 is also connected to an expansion bus During the address transfer phase, certain transfer 

bridge 219 (possibly more than one) and, optionally, a video attribute signals are used indicate the nature of transaction, 

bus bridge 220. In a preferred embodiment, the system bus . including whether the transaction is an address-only trans- 

204 is a superset of the conventional Power PC 601 micro- 50 action. Assuming that the transaction is not, then the transfer 

processor interface referred to herein as the Apple RISC start signal TS is treated by the arbiter 600 as an implicit data 

Bus, or ARBus. An expansion bus connected to the expan- bus request, starting the data arbitration phase. Following 

sion bus bridge 219 may be a standard PCI bus. Likewise, assertion of the acknowledge signal AACK, a data bus grant 

a video bus connected to the video bus bridge 220 may be signal DBG is asserted by the arbiter 600 once the data bus 

a PCI-like bus. 55 205 is available for use by the CPU 203. The CPU 203 may 

Referring to FIG. 3, the memory subsystem 208 including 1060 be & n the data transfer phase on the next cycle by 

the memory controller 300 of FIG. 2 are shown in greater driving the data bus 205. During a subsequent data termi- 

detail, with particular emphasis on the various signals input Dalion P Dase > tne s!ave device asserts a transfer acknowledge 

to and output from the memory controller 300. The memory si 8 nal TA afier il nas sampled the data on the data bus 205. 

controller 300 includes a main memory controller 302, a 60 The foregoing sequence of operations is repeated for a 

cache/ROM controller 305, and an arbiter 600. The main second subsequent transaction. In FIG. 4, the transaction to 

memory controller 302 produces address and control signals which address and data information pertain is indicated in 

for the main memory 209 and includes a DRAM sequencer parentheses, i.e., transaction (1) and transaction (2). 

303 and certain memory address logic. The cache/ROM Note that in FIG. 4, address tenures and data tenures, 

controller 305 produces control signals for the level-two 65 although they may be pipelined, are tightly ordered. That is, 

cache memory 212 and the read-only memory 211 and data bus tenure on the system is granted in the same order 

includes a cache/ROM sequencer 306 and certain cache as address tenure is granted even if the address tenures are 
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granted to different masters. In precise terms, if TS(n) is for 
Master A and TS(n+l) is for Master B, then DBG(n) will be 
for Master A and DBG(n+l) will be for Master B. 

This tight ordering of the conventional MPC601 bus may 
result in considerable system performance degradation, 5 
especially as bus speed increases. A read transaction to an 
expansion-bus device, for example, will typically be high- 
latency as compared to a main-memory read transaction. 
Tight ordering of address and data tenures results in such 
latency impacting the data bus. That is, even though another 10 
transaction might be ready to use the data bus first, during 
the latency period, it cannot because of the tight ordering of 
address and data tenures. If a system is to handle information 
streams having real-time constraints, such as video streams, 
it is important to ensure that the data bus is not unavailable 15 
for use during substantial periods of time; otherwise real- 
time deadlines may be missed, resulting in objectional 
artifacts during presentation. 

The architecture of the computer system of FIG. 2 
decouples address and data tenures such that data bus 20 
utilization is increased. This increase in data bus utilization 
allows for higher real-time performance to be achieved. In 
particular, the present invention allows for a true split-bus 
architecture with ordered slaves and ordered masters. 
"Ordered," in one usage, means each master and each slave 25 
has its own independent FIFO structure supporting 
"ordered" service to transactions posted to it. If a slave 
receives three transactions A, B, and C, the it will respond 
to A first, B second, and C third. If a master performs 
transactions D, E, and F, then it expects servicing of those 30 
transactions in the order of D first, E second, and F third. In 
one embodiment, there can be up to three outstanding 
master/slave pair transactions at one time. 

Referring briefly again to FIG. 3, the side-band informa- 35 
tion signals carried by the control bus 309 are side-band 
information signals used to decouple the address bus 206 
and the data bus 205. These side-band information signals 
include, in addition to the bus request signal BR, the bus 
grant signal BG and the data bus grant signal DBG of FIG. 
4, corresponding signal for each master besides the CPU 
203. 

In one embodiment, the system includes, besides the CPU 
203, four additional masters for up to a total of five masters: 
the CPU 203, the secondary processor 218 (if present), the 45 
expansion bus bridge 219, one additional expansion bus 
bridge (if present), and the video bus bridge 220 (if present). 
The control bus 309 therefore carries five bus request signals 
BR[0:4], five bus grant signals BG[0:4], and five data bus 
grant signals DBG[0:4]. 50 

In the same embodiment, the system includes six slaves: 
the expansion bus bridge 219 (also a master), the additional 
expansion bus bridge (also a master, if present), the video 
bus bridge 220 (also a master, if present), the main memory 
209, the read-only memory 211, and memory controller 55 
registers accessible via the register data bus 217. For each 
slave, the control bus 309 carries three signals: a slave 
acknowledge signal SACK, a read data available signal 
RDDA, and a source- or sink-data signal SSD. The control 
bus 309 therefore carries six slave acknowledge signals 50 
SACK[0:5], six read data acknowledge signals RDDA[0:5], 
and six source- or sink-data signals SSD[0:5]. 

The manner in which the foregoing signals are used to 
decouple address tenures and data tenure may be appreciated 
with reference to FIG. 5. For simplicity, the address arbi- 65 
tration phase has not been illustrated. The address transfer 
phase is essentially the same as in the conventional case. The 
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address termination phase, however, differs. The addressed 
slave asserts the AACK signal in the conventional manner, 
the AACK signal being used by the master. In parallel with 
AACK, the addressed slave generates a SACK signal for use 
by the arbiter 600. The arbiter uses this information about 
which slave has acknowledged in order to reorder transac- 
tions on the system bus 204. 

In the data arbitration phase, the data bus is granted to 
masters based on a priority ordering of masters, and is 
granted to slaves based in part on the priority of the master 
of the transaction and in part on the availability of data from 
the slave. What may be considered in effect two sets of grant 
signals are therefore defined, DBG[0:#Masters-l] for mas- 
ters and SSD[0:#Slaves-l] for slaves. 

Assume, for example, that in FIG. 5 the first transaction 
is a read by the CPU 203 from the expansion bus bridge 219 
and that the second and third transactions are writes to 
memory from the video bus bridge 220. In general, video 
transactions will be assigned a higher priority than transac- 
tions by the CPU 203 because of the real-time requirements 
of video transactions. Data bus grant signals are therefore 
issued to video bus bridge 220 for the first video transaction 
(2), which proceeds through the data transfer phase, and the 
second video transaction (n), which also proceeds through 
the data transfer phase. The CPU 203 will not be issued a 
data bus grant signal for its read from the expansion bus 
bridge 219 until a read data acknowledge signal has been 
returned to the arbiter 600 from the expansion bus bridge 
219. Then, the CPU 203 will be issued a data bus grant 
signal for its read and the expansion bus bridge 219 will 
simultaneously be issued a corresponding slave source-data 
signal causing it to present its data on the data bus 205 to be 
sampled by the CPU 203. 

As may be appreciated from the foregoing description, the 
data arbitration phase in accordance with the present inven- 
tion is very different than in the conventional case. This 
different manner of operation allows address and data ten- 
ures to be decoupled, increasing utilization of the data bus. 
The data transfer and data termination phases, however, are 
essentially the same as in the conventional case. 

Transaction reordering is controlled by the arbiter 600. 
The general characteristics of the arbiter 600 will first be 
described, after which the arbiter 600 will be described in 
greater detail. 

The basic behavior that the arbiter 600 guarantees is as 
follows: 

Any given ARBus master has its own address and data 
tenures strictly ordered. That is, DBG(n) always cor- 
responds to TS(n) and for a set of TS(n) and TS(n+l), 
DBG(n) will always occur before DBG(n+l). 

Any given ARBus slave has its own data tenures strictly 
ordered. That is, SSD(n) always corresponds to TS(n) 
and for a set of TS(n) and TS(n+l), SSD(n) will always 
occur before SSD(n+l). 

Data bus tenure is not necessarily granted on the ARBus 
in the same order as address tenure is granted if the 
address tenures are granted to different masters. That is, 
if TS(n) is for Master A and TS(n+l) is for Master B, 
DBG(n) may be for Master B and therefore DBG(n+l) 
for Master A. 

In the illustrated embodiment, the arbiter 600 supports 
five logical masters. The five masters arbitrate for use of the 
bus in accordance with a fixed priority as follows: the video 
bus bridge 220, the expansion bus bridge 219, an additional 
expansion bus bridge (if present), the CPU 203, and the 
secondary processor 218. By giving highest priority to the 
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video bus bridge 220, the arbiter 600 allows the video bus Based on the foregoing input signals, the arbiter multi- 
bridge 220 to "hog" the ARBus. plexer 603 produces a slave match vector SlvMatch and a 

The arbiter 600 may optionally "park" the CPU 203 or the slave read ready vector SlvRdReady. The slave match vector 

video bus bridge 220 on the ARBus by asserting the appro- SlvMatch designates those masters'finding matching slaves, 

priate BG wire during idle bus cycles. The default mode of 5 j. e>> s i ave s expecting to next respond to transactions from 

operation is to park the most recent master. those respective masters. The slave read ready vector SlvR- 

Address bus arbitration occurs m every cycle that an dReady identifies, of those masters, which have slaves that 

address tenure is not active. Masters assert their individual afe actuaH read tQ mm data The slave match vect0f 

bus request signals BR) to the arbiter 600 to signal a request slvMatch and lhe slave read ready vector SlvRdReady are 

for service^ The arbiter 600 signals the mas e, - which has data bus ^ machine ^ 

won the arbitration by asserting bus grant (BG). Masters that \™ o*^^ < « « <> ■ L £M 

i_ • i c * .to i The SACK vectors at the head of the master queues 601 

have BG asserted in a given cycle are free to assert TS and " t , ' , - , AJ r „ 

lk _ n ■ .u.. „„ vt i are also input to the data bus arbiter state machine 604. The 

therefore start a transaction in the next cycle. , \ , , , , . , 

The arbiter 600 controls the use of the data signals as a data bus arblter state machine 604 determines which trans- 
function of the address and the availability of read data. If action * read y t0 S° b V examining the bits of the SlvMatch 
a given ARBus address receives an AACK, the arbiter 600, 15 vector m pnonty order and, it a bit indicates a matching 
by sampling the SACK signals, knows which slave will master/slave pair, determining further whether either the 
accept write data or will return read data. A slave that asserts transaction is a write transaction (by examining the RdAVr 
AACK for a write transaction gives implicit permission to bits at the front master queue entries) or the corresponding 
the arbiter 600 to grant the data bus to the master and allow bit in the SlvRdReady vector is set, indicating that the slave 
it to assert the associated write data. Slaves must assert 20 is ready to source data. In Verilog notation, the data bus 
RDDA when requested return read data is available. arbiter state machine 604 computes a vector TransReady as 

The arbiter 600 grants the data bus to a selected master via follows: 

the assertion of DBG (Data Bus Grant) and indicates to the TransReady[0:4] = SlvMatch[0:4]& 

slave that data is to be asserted or accepted via the assertion ({5{Write}}}|SlvRdReady[0:4]) Based on the computed 

of SSD (Source or Sink Data). 25 TransReady vector, the data bus arbiter state machine 604 

Transactions which do not involve a data transfer asserts a corresponding one of the data bus grant signal 

(Address-Only transactions) are typically generated by the DBG. The data bus arbiter state machine 604 also asserts a 

CPU 203 or the secondary processor 218 and are simply corresponding one of the source-or-sink-data signals SSD, 

acknowledged (AACK asserted) by the arbiter 600. in accordance with the SACK vector at the front of the 

Referring now to FIG. 6, the arbiter 600 will be described 30 winning master queue, 

in greater detail. The arbiter 600 includes master queues 601, Operation of the arbiter 600 may be further understood 

one for each master in the system, and slave queues 602, one from the following illustrative examples, 

for each slave in the system. Each of the master queues 601 To take a relatively simple example, assume that Master 

are connected at their respective data inputs to a SACK 1 (the expansion bus bridge 219) issues a read transaction to 

vector composed of the slave acknowledge signals SACK of 35 Slave 3 (the video bus bridge 220). Slave 3, when it is ready 

each of the slaves, in addition to a Rd/Wr signal. Hereinafter, to service the transaction, asserts the AACK signal on the 

the term "SACK vector" will be understood to mean signals ARBus and, at the same time, generates a SACK signal to 

including the slave acknowledge signals SACK of each of the arbiter 600 identifying Slave 3. When the arbiter 600 

the slaves and the Rd/Wr signal. Each of the slave queues receives the AACK signal, the SACK vector is pushed onto 

602 are connected at their respective data inputs to a BG 40 one of the master queues 601 based on the BG vector. At the 

vector composed of the bus grant signals BG of each of the same time, the SACK vector is pushed onto one of the 

masters. (In more precise terms, the BG vector is the master queues 601 based on the BG vector. Assuming that 

physical bus grant signals sampled in the cycle that the TS no other transactions are presently queued, a SACK vector 

signal is asserted.) The bus grant signals BG are produced by value representing Slave 3 (for example blllOll) will 

an address bus arbiter state machine 605 in response to the 45 appear at the head of the one of the master queues 601 for 

bus request signals BR of each of the masters. Master 1, and a BG vector value representing Master 1 (for 

Each time the address acknowledge signal AACK is example blOlll) will appear at the head of the one of the 

presented on the system bus 204, the master queues 601 and slave queues 602 for Slave 3. The arbiter multiplexer 603 

the slave queues 602 are updated by pushing the SACK will therefore cause the SlvMatch vector to have a value 

vector onto one (and only one) of the master queues 601 and 50 indicating a match for Master 1 (for example bOlOOO). When 

pushing the BG vector onto one (and only one) of the slave Slave 3 is ready with read data, it will assert its RDDA 

queues 602. In particular, the SACK vector is pushed onto signal, in response to which the arbiter multiplexer 603 will 

one of the master queues 601 identified by the BG vector, cause the SlvRdReady vector to have a value indicating the 

and the BG vector is pushed onto one of the slave queues readiness of Slave 3 (for example bOOlOO). If no other 

602 identified by the SACK vector. 55 transactions having higher priority have in the meantime 
The iSACK vectors at the heads of the master queues 601 become ready to go, the data bus arbiter state machine 604 

and the BG vectors at the heads of the slave queues 602 are will then issue a data bus grant signal DBG to Master 1 and 

input to an arbiter multiplexer 603. The arbiter multiplexer a sink/source data signal SSD to Slave 3, and the data 

603 looks at the SACK vectors at the head of the master transfer phase of the transaction will proceed. 

queues 601 and determines which of the slave queues 602 60 To take another, more complex example, assume that after 

designated by the SACK vectors have at their heads a BG Master 1 has issued the foregoing transaction request 

vector that designates the reciprocal one of the master (shown below as Transaction 1) but before Slave 3 has 

queues 601. On the next data tenure of the masters for which responded with an RDDA signal, a series of further trans- 

this condition is satisfied, data will be sourced from the actions is issued, in accordance with the following chrono- 

corresponding slave. The arbiter multiplexer 603 also 65 logical sequence: 

receives a read-ready vector RDDA composed of the read 1. Master 1 Rd Slave 3 

data acknowledge signals RDDA of each of the slaves. 2. Master 3 Wr Slave 3 
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BUS 



BUS CHARACTERISTIC ARBUS 



PCI BUS 



ADDRESS/DATA TENURES Full split transaction Single envelope 



(pended) 
ENDIANESS Big endian 

CYCLE TYPES One cycle type 

TRANSACTION LENGTHS Fixed (3.2-byte) 
burst length . 



BUS SPEED 



Up to 50 MHz 



(non-pended) 
Little endian 
Many cycle types 
Arbitrary length 
transactions with 
byte-enabled writes. 
33 MHz 
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3. Master 3 Wr Slave 0 

4. Master 4 Rd Slave 1 

5. Master 2 Wr Slave 4 
Note that transactions 1 and 2 both involve Slave 3, and 

transactions 2 and 3 both involve Master 3. Because masters 
and slaves are ordered, data dependencies are created. That 
is, transaction 2 cannot complete until transaction 1 has 
completed. Similarly, transaction 3 cannot complete until 
transaction 2 has completed. Transactions 4 and 5, on the 
other hand, have no data dependencies. Transaction 4 is a 
read from Master 4 (CPU 1) to Slave 1 (ROM). In the case 
of ROM and RAM, because read latency is minimal and is 
know in advance, the RDDA signals for ROM and RAM arc 
tied asserted. 

Transaction 2, Master 3*s write of Slave 3, is queued up 
behind Master l's read of Slave 3. Transaction 3, Master 3's 
write of Slave 0, is queued up behind Master 3's write of 
Slave 3. When transaction 4 is queued, there are matching 
queue entries at the head of the master and slave queues for 
transactions 1 and 4. Transaction 1, however, is a read 
transaction and is not allowed to proceed until an RDDA is 20 
received from Slave 3. 

Therefore, the arbiter 600 first grants the data bus to 
Master 4 and Slave 1 for transaction 4. When transaction 5 
is queued, there are matching queue entries at the head of the 
master and slave queues for transactions 1 and 5. Assume, 
however, that an RDDA has still not been received from 
Slave 3. The arbiter 600 will then grant the data bus to 
Master 2 and Slave 4 for transaction 5. 

Assume now that an RDDA is received from Slave 3. 
Transactions 1, 2 and 3 will then, in that order, be granted the 30 
bus and will complete. In the foregoing example, whereas 
the address order of the transactions is 1, 2, 3, 4, 5, the data 
order is 4, 5, 1, 2, 3. 

When the system is totally idle, i.e., the data bus is not 
busy and all queues are empty, a CPU memory read trans- 
action is executed immediately without queuing the trans- 
action. 

The expansion bridge responds to transactions on the 
ARBus and PCI Bus and forwards them to the "other" bus 
appropriately. The primary function of the expansion bridge 
is to map transactions from one bus to the other. The job of 
the expansion bridge to transfer data between the ARBus 
and the PCI Bus is complicated by the fact that the ARBus 
and the PCI Bus are very different in a number of respects 
as shown in the following table: 

TABLE 1 



25 



35 



40 



45 



50 



55 



The PowerPC architecture and the ARBus do not "natu- 60 
rally" generate many types of cycles that are required by the 
PCI specification. These unique PCI Bus cycles are included 
in the PCI specification to provide backwards compatibility 
for x86/ISA/IBM PC- AT cards and software. The expansion 
bridge provides facilities for generating PCI Bus configu- 65 
ration cycles, I/O cycles and PCI "Special Cycles"/ 
"Interrupt Acknowledge" via special address spaces. 



Referring now to FIG. 7, the expansion bridge 700 will be 
described in greater detail. The expansion bridge is con- 
structed with two main state machines for the ARBus and 
PCI Bus. The two main state machines actually consist of a 
number of smaller sub-state machines. These state machines 
operate in different clock domains and require that hand- 
shake signals be synchronized. Transactions passed between 
the ARBus and the PCI Bus are staged in a large packet- 
buffer structure. Data endian conversion is performed on the 
ARBus side of the packet buffer with data being stored in the 
packet buffer in PCI Bus Little Endian format. Address 
endian swizzling is performed on the master side of a 
transaction. For a master cycle to the PCI Bus from the 
ARBus, the address swizzling occurs on the ARBus side. 
For a master cycle to the ARBus from the PCI Bus, the 
address swizzling occurs on the PCI Bus side. 

As explained previously, systems are most prone to dead- 
locks and livelocks when there is an expansion bridge in the 
system. In the description that follows, a deadlock will be 
introduced, together with its Lockup type (A, B, or C as 
described below), a solution for the deadlock, and where in 
the system the deadlock prevention logic preferably resides. 
Deadlock prevention rules assume a starting point behavior 
in which the expansion bridge allows concurrent reads 
through the bridge, and the ARBus arbiter performs the 
DBWO* protocol as necessary. The DBWO* protocol 
allows the Processor to re-order a write data phase around a 
read data phase for snoop pushes. 

An entire class of deadlocks and livelocks is related to the 
PCI Bus being stalled during reads. During a read, the PCI 
Bus can potentially remain stalled for micro-seconds at a 
time when the target of the read is on the other side of a 
bridge. For instance, a Master on PCI Bus 1 wants to read 
from a target behind a PCI2PCI bridge on PCI Bus 2. In this 
case the master incurs the latency of three bridges (a first 
expansion bus bridge, a second expansion bus bridge, and a 
PCI2PCI bridge) before actually reaching the target, and no 
other transactions can occur on PCI Bus 1 as long as the read 
is stalling the bus. If other transactions from the ARBus were 
able to get access to the PCI Bus and complete, then the class 
of deadlocks related to conflicting completion orders would 
disappear. This type of lockup is referred to herein as Type-A 
LockUp. 

Another class of deadlocks and livelocks is related to the 
ISA bus and PCI2PGI bridge behavior. When an ARBus read 
occurs to an ISA bus or a target behind a PCI2PCI bridge, 
it has no way of knowing whether it will complete or be 
blocked. A "block" can occur for the ISA bus if there is an 
ISA bus master already on the ISA bus with a pending 
transaction; this transaction may or may not require ARBus 
access. A "block" can also occur for the PCI2PCI bridge if 
the bridge has writes posted to it that it must perform on the 
host side of the PCI2PCI bridge before completing the read. 
In either of these two cases, there is an ARBus master that 
will wait forever for its read to either the ISA bus or 
PCI2PCI bridge to complete. If anything "blocks" the ISA 
bus or PCI2PCI bridge from completing its non-back-offable 
access, deadlock will occur. This type of lockup is referred 
to herein as Type-B Lockup. 

A third class of deadlocks and livelocks is related to the 
ARBus arbiter being fixed priority, and to cross- 
communication problems between devices on the bus who 
are both masters and slaves. Lower priority masters can be 
starved from gaining ownership of the ARBus when follow- 
ing the generic ARBus rules set forth for behavior following 
an ARBus ARTRY*. If in addition, the lower priority master 
is unable to accept transactions as a slave, deadlocks or 
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livelocks can occur. This type of lockup is referred to herein 
as Type-C LockUp. 

Deadlock avoidance is complicated by the fact that in 
some systems there may be more than one expansion bridge. 
Hence, deadlocks in a true split bus architecture having only 5 
a single expansion bridge connected to the ARBus will be 
considered first, followed by a consideration of deadlocks in 
a true split bus architecture having two expansion bridges. 
Systems having more than two expansion bridges will not be 
considered, although similar deadlock avoidance principles 10 
may be applied to such systems. 

Various deadlocks can occur with a single expansion 
bridge in a system implemented with a split bus (ARBus), 
ordered masters, ordered slaves, and utilizing a fixed priority 
arbitration scheme for the masters on the ARBus. These 15 
deadlocks can also occur in a dual expansion bridge system 
with the same characteristics, but only one expansion bridge 
need be involved to cause the deadlock. 

Referring to FIG. 8, deadlock may occur when an ARBus 
master read of an expansion bridge is followed by an ARBus 20 
master read to memory. A typical sequence of transactions is 
as follows: 

1. PCI Bus 1 Master initiates read of main memory, and 
stalls PCI Bus 1. 

2. Processor 1 reads target behind Expansion Bridge 1 25 
(Expansion Bridge 1 AAck*s without ARTRY*). 

3. Processor 1 reads main memory (Memory Controller 
AAck*s without ARTRY*). 

4. Expansion Bridge 1 forwards read of main memory ^ 
(Memory Controller AAck*s without ARTRY*). J 

Master Processor 1 has ordered itself: a) Expansion 
Bridge 1, b) Read main memory. Slave main memory has 
ordered itself: a) Read by Processor 1, b) Read by Expansion 
Bridge 1. PCI Bus 1 has an implied ordering of a) Read main 35 
memory, b) Read by Expansion Bridge. PCI Bus 1 is stalled 
by the read of main memory and will not get off the bus until 
the read has completed. In this case, the completion order of 
Master Processor 1 directly conflicts with completion order 
of PCI Bus 1. This is a Type-A LockUp. There are two ^ 
potential solutions: 1) Retry the Expansion Bridge 1 read of 
main memory, OR 2) Retry the Processor 1 read of main 
memory. For reasons described hereinafter, Solution 2 is 
preferred for ease of implementation. This deadlock is 
therefore avoided by having the ARBus arbiter prevent the 45 
Processor from reading main memory (via ARTRY*) fol- 
lowing the Processor's read of an expansion bridge. 

Referring to FIG. 9, deadlock may occur when an ARBus 
master read of an expansion bridge is followed by an ARBus 
master L2 hit or allocate operation. A typical sequence of 5Q 
transactions is as follows: 

1. PCI Bus 1 Master initiates read of main memory, and 
stalls PCI Bus 1. 

2. Master A reads target behind Expansion Bridge 1 
(Expansion Bridge 1 AAck*s without ARTRY*). 55 

3. Master A issues memory read causing the L2 (second 
level cache) to allocate the cache line. 

4. Expansion Bridge 1 must complete its read of main 
memory, but it cannot complete. 

Because the TAG SRAMs utilize a latch to capture the 60 
address from the main Address Bus during a TS_, no future 
TS_ can occur until the completion of the TAG update. The 
system arbiter prevents future TS_ events by deasserting all 
Bus Grants to Masters until the completion of the TAG 
update. Unfortunately, in the scenario described above, the 65 
TAG update will not complete until the PCI Bus Master A 
read of main memory has occurred. Master A has ordered 
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itself: a) Read of Expansion Bridge 1, b) Read of main 
memory. Expansion Bridge 1 has ordered itself: a) Read of 
main memory, b) Read by Master A. This is a Type-A 
LockUp. Since the TAG update must complete without 
future occurrences of TS_, the deadlock fix is to have the 
ARBus arbiter prevent an access by Master A that would 
cause a second level cache hit or allocate (via ARTRY*) 
following Master A's read of an expansion bridge. 

Referring to FIG. 10, deadlock may occur when a pro- 
cessor read of an expansion bridge is followed by a proces- 
sor write to that expansion bridge. A typical sequence of 
transactions is as follows: 

1. Processor 1 reads target behind PCI2PCI bridge behind 
Expansion Bridge 1. PCI2PCI bridge blocks read 
completion in order to flush posted write data to target 
upstream of PCI2PCI bridge. 

2. Processor 1 writes target behind Expansion Bridge 1. 

3. Expansion Bridge 1 write attempt to main memory 
causes Processor 1 to attempt Snoop Push. 

The PCI2PCI bridge has become interlocked, and must 
flush a posted write upstream of itself; in this case the write 
is headed toward the ARBus, and Expansion Bridge l's 
buffers are full and cannot currently accept the write. The 
first two outstanding transactions in this scenario are 1) 
Master Processor 1 has an outstanding read of Expansion 
Bridge 1, followed by 2) Master Processor 1 has an out- 
standing write to Expansion Bridge 1. The third attempted 
transaction is a write cycle from Expansion Bridge 1 to main 
memory. However, this write cycle is to copyback-cacheable 
space and causes a snoop hit in Processor l's cache. Pro- 
cessor 1 retries Expansion Bridge 1 's write cycle, but now 
needs to push the dirty cache line to main memory. However, 
at this point it is unable to push the dirty cache line due to 
its outstanding write to Expansion Bridge 1. With the use of 
DBWO*, Processor 1 could have re -ordered the snoop push 
write transaction around its outstanding read of Expansion 
Bridge 1 (transaction number 1). However, the MPC60x 
microprocessor is not capable of re-ordering the snoop push 
write transaction around its own outstanding write. This is a 
Type-B LockUp, caused by Processor l's inability to com- 
plete its read due to the PCI2PCI bridge's interlocking 
behavior. This deadlock is avoided by having the ARBus 
arbiter prevent the Processor from writing to an expansion 
bridge if it has an outstanding read of the expansion bridge. 
This will allow the Processor to perform the Snoop Push 
write transaction if required. 

There is a set of deadlocks that only occur with more than 
one an expansion bridge in a system implemented with a 
split bus (ARBus), ordered masters, ordered slaves, and 
utilizing a fixed priority arbitration scheme for the masters 
on the ARBus. In one particular system architecture, high to 
low priority is: 1) Video, 2) Expansion Bridge 1, 3) Expan- 
sion Bridge 2, 4) Processor 1, 5) Processor 2. Deadlock rules 
described previously also apply to a multiple expansion 
bridge environment. The following new rules are in addition 
to the previous rules. 

Referring to FIG. 11, deadlock may occur in the case of 
concurrent Bus Grant and Address Retry signals. A typical 
sequence of transactions is as follows: 

1. Expansion Bridge 1 attempts a write to Expansion 
Bridge 2 but Expansion Bridge 2 buffers are foil. 

2. Expansion Bridge 2 has a write to Expansion Bridge 1 
and received Bus Grant during Expansion Bridge 1 
cycle. 

3. Expansion Bridge 2 ARTRY*s Expansion Bridge 1 due 
to foil buffers. As per ARBus specification, Expansion 
Bridge 2 ignores its Bus Grant and does not take the 
ARBus. 
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4. As per ARBus specification, following ARTRY* both 
Expansion Bridge 1 and Expansion Bridge 2 de assert 
their Bus Requests for one clock. Both re-assert Bus 
Requests. Expansion Bridge 1 wins. The foregoing 
sequence of transactions is repeated indefinitely. 
Following ARBus protocol after an ARtry*, a master who 
has a Bus Grant ignores it. All masters must deassert their 
Bus Requests the clock following an ARtry*, and then 
re-assert them. In a fixed priority arbitration scheme, the 
higher priority master will win every time, and if it cannot 
complete its access, an ARBus livelock results. This is a 
Type-C Lockup, and is avoided by having the expansion 
bridge disregard the ARBus protocol, and take the address 
tenure if a Bus Grant occurs during an ARtry*. An expansion 
bridge can do this without adverse side-effects because it is 
not a snooping bus master. 

Referring to FIG. 12, deadlock may occur in the case of 
concurrent Bus Request and Address Retry signals. Atypical 
sequence of transactions is as follows: 

1. Video attempts a write to Expansion Bridge 1 but 
Expansion Bridge 1 buffers are full; 

2. Expansion Bridge 1 has its Bus Request asserted 
because it has a read of memory to perform, but Video, 
with multiple cycles to perform, keeps its Bus Request 
asserted. 

3. Expansion Bridge 1 ARTRY*s Video due to full 
buffers. As per ARBus specification, Expansion Bridge 
1 and Video deassert their bus requests the clock 
following ARTRY*. 

4. Video and Expansion Bridge 1 reassert the bus requests. 
Since Video has a fixed higher priority than Expansion 
Bridge 1, it constantly gets Bus Grant. The foregoing 
sequence of transactions is repeated indefinitely. 

Following ARBus protocol after an ARTRY*, all masters 
on the bus deassert their Bus Requests to give the Processor 
a guaranteed window being the only bus requestor. This 
guarantees that the Processor, who normally has lowest 
ARBus priority, acquires the bus next in order to complete 
a high priority transaction such as a Snoop Push. In this case, 
the ARBus protocol causes the lower priority expansion 40 
bridge to never receive a Bus Grant due to the higher priority 
Video requesting the ARBus to complete its access. Since 
the completion of the Video access is dependent on the 
expansion bridge freeing up some buffer space, and since the 
expansion bridge must get the ARBus to complete its access 45 
or receive an ARTRY* in order to free up PCI Bus 1 to free 
up buffer space for the Video write to come in, the expansion 
bridge effectively needs higher priority than Video this time. 
This is a Type-C LockUp, and is avoided by having an 
expansion bridge keep its Bus Request asserted the clock 50 
following an ARTRY* if it is the source of the ARTRY*. 
This is precisely the protocol the MP60X processor per- 
forms to effectively achieve a higher priority when neces- 
sary. 

Referring to FIG. 13, deadlock may occur in the case of 55 
expansion bridges reading each other concurrently. A typical 
sequence of transactions is as follows: 

1. A Master Behind Expansion Bridge 1 reads a target 
behind Expansion Bridge 2 (Expansion Bridge 2 
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accepts the read from the opposite expansion bridge. Neither 
of the accepted reads can complete because the buses they 
are attempting to get onto are stalled. At least one of the 
buses must free itself for this basic deadlock to be avoided; 
one expansion bridge must not accept the read, but must 
ARTRY* the read attempt to it. This is a Type-A LockUp, 
and is avoided by having an expansion bridge disallow a 
read of its slave while it has an outstanding master read 
tenure (AAck* without ARTRY*). Once the data bus grant 
is received corresponding to the address tenure, then the 
transaction is guaranteed to complete and slave reads can be 
accepted. 

Referring to FIG. 14, deadlock may occur in the case of 
one master attempting to read both expansion bridges. A 
typical sequence of transactions is as follows: 

1. PCI Bus 1 Master initiates read of target behind 
Expansion Bridge 2, and stalls PCI Bus 1. 

2. Processor 1 reads target behind Expansion Bridge 1 
(Expansion Bridge 1 AAck*s without ARTRY*). 

3. Processor 1 reads target behind Expansion Bridge 2 
(Expansion Bridge 2 AAck*s without ARTRY*). 

4. PCI Bus 1 Master's read of target behind Expansion 
Bridge 2 occurs on ARBus (Expansion Bridge 2 
AAck*s without ARTRY*). 

Master Processor 1 has ordered itself: a) Read Expansion 
Bridge 1, b) Read Expansion Bridge 2. Slave Expansion 
Bridge 2 has ordered itself: a) Read by Processor 1, b) Read 
by Expansion Bridge 1. Expansion Bridge 1 has implied 
ordering due to stalled PCI Bus of: a) Read of Expansion 
Bridge 2, b) Read by Processor 1. In this scenario, all three 
devices involved have conflicting completion orders. 
Although Processor l's read of the target behind Expansion 
Bridge 2 can complete on PCI Bus 2, it cannot complete on 
the ARBus until Processor l*s read of Expansion Bridge 1 
has completed. Expansion Bridge l's read of Expansion 
Bridge 2 must complete before Processor l's read of Expan- 
sion Bridge 1 can complete. Since Expansion Bridge 2 is 
ordered to deliver the response to Processor l's read before 
delivering the response to Expansion Bridge l's read, the 
deadlock results. This is a Type-A LockUp, and is avoided 
by preventing one master from reading both an expansion 
bridges. This prevents the response ordering dependencies 
for the master. 

Referring to FIG. 15, deadlock may occur in the case of 
an ISA bus master reading a target behind an opposite 
expansion bridge. A typical sequence of transactions is as 
follows: 

1. PCI Bus 2 Master reads ISA target behind Expansion 
Bridge 1, stalling PCI Bus 2 (Expansion Bridge 1 
AAck*s) 

2. ISA Master on ISA initiates read of target behind 
Expansion Bridge 2. ISA Master cannot be backed off. 

3. Expansion Bridge 1 forwards ISA Master's read to 
Expansion Bridge 2. Expansion Bridge 2 retries Expan- 
sion Bridge 1 because PCI Bus 2 Master read is 
outstanding. This occurs indefinitely. 

The fact that the master behind Expansion Bridge 2 got its 
read AAck*ed by Expansion Bridge 1 on the ARBus prior to 



AAck*s) stalling PCI Bus 1. The read remains out- eo the ISA bus master behind Expansion Bridge 1, implies that 



standing within Expansion Bridge 2. 

2. A Master Behind Expansion Bridge 2 reads a target 
behind Expansion Bridge 1 (Expansion Bridge 1 
AAck*s) stalling PCI Bus 2. The read remains out- 
standing within Expansion Bridge 1. 

This is the most basic deadlock case. Each expansion 
bridge has a stalled bus, and yet each expansion bridge 
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Expansion Bridge 2's completion order is: 1) Complete read 
to ISA bus behind Expansion Bridge 1, 2) Accept incoming 
read from Expansion Bridge 1 (or whomever). However, the 
ISA bus has initiated an access and will retry all accesses to 
it until its read of the target behind Expansion Bridge 2 has 
completed. The ISA bus completion order is: 1) Complete 
read to target behind Expansion Bridge 2, 2) Accept incom- 
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ing read from Expansion Bridge 2 (or whomever). These 
two masters have conflicting completion orders. Note that if 
PCI Bus 2 had not been stalled by its read and Expansion 
Bridge 2 could have accepted the read from Expansion 
Bridge 1, then all transactions would be able to complete. 
This is a Type-A (PCI Bus 2 stall) and Type-B LockUp (ISA 
bus block). The fix is to allow ISA bus master cards to 
communicate only with main memory or targets behind the 
same expansion bridge. For example, system software may 
remap accesses across the bridges to memory and complete 
transfers virtually. 

Referring to FIG. 16, deadlock may occur when a PCI bus 
master read gets stuck behind a posted PCI bus master write. 
A typical sequence of transactions is as follows: 

1. Three transactions: a) Processor 1 Reads Expansion 
Bridge 1, b) Processor 1 Reads Expansion Bridge 1, c) 
Processor 2 Reads Expansion Bridge 2. 

2. Meanwhile: a) Expansion Bridge 1 has a write trans- 
action destined for Expansion Bridge 2, and a PCI Bus 
Master on PCI Bus 1 issues a read of memory, stalling 
PCI Bus 1, b) Expansion Bridge 2 has a write trans- 
action destined for Expansion Bridge 1, and a PCI Bus 
Master on PCI Bus 2 issues a read of memory, stalling 
PCI Bus 2. 

The normal means to get a PCI Bus Master read to free up 
the PCI Bus is to retry a transaction from the PCI bus when 
it cannot be serviced. Normally, the PCI Bus Master read 
would propagate to the ARBus, attempt its cycle on the 
ARBus, and either complete or get an ARTRY*. In either 
event, it frees up the bus. For a high-performance 
architecture, concurrent reads are desired at all times. The 
scenario on both PCI buses is that they are stalled with reads 
heading to memory, but there are write transactions to the 
opposite expansion bridge in each expansion bridge which 
cannot complete (because the transaction limit has been 
reached). Since neither expansion bridge's ARBus master 
write transactions can complete their address tenure, their 
respective PCI Bus Master read tenures cannot gain access 
to the ARBus to complete or receive an ARTRY*. In this 
instance the PCI buses will remain stalled indefinitely. This 
is a Type-A LockUp, and is avoided by having an expansion 
bridge immediately retry PCI Bus master reads if it has a 
PCI Bus master write transaction queued up in front of it that 
has not completed. This will ensure that the PCI Bus master 
read has access to the ARBus to complete the access or 
receive an ARTRY*. 

Referring to FIG. 17, deadlock may occur when the 
ARBus transaction limit is hit, and accesses cannot com- 
plete. A typical sequence of transactions is as follows: 

1. Three transactions: a) Processor 1 Reads ISA target 
behind Expansion Bridge 1, b) Processor 2 Reads target 
behind Expansion Bridge 2, c) Expansion Bridge 2 
Reads target behind Expansion Bridge 1, stalling PCI 
Bus 2. 

2. Meanwhile: a) Expansion Bridge 1 has a write trans- 
action destined for Expansion Bridge 2, and b) an ISA 
Bus Master has initiated a read access of main memory 
on the ISA Bus. The ISA bus master cannot be backed 
off. This ISA bus master access blocks the Processor 1 
Read from completing. 

The fundamental problem with this scenario is that the 
transaction queue depths are limited to three transactions. If 
the depth were four, then the Expansion Bridge 1 write 
transaction destined for Expansion Bridge 2 could complete, 
allowing the ISA bus master read of main memory to 
complete, etc. Given that the transaction queue depths are 
limited to three transactions, the other two problems to note 
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are thai the PCI Bus Master on PCI Bus 2 has stalled its bus 
with the read of the target behind Expansion Bridge 1 and 
that the ISA bus master has stalled its ISA bus with the read 
of main memory. If either bus were not stalled, then either 
5 the Processor 2 read of the target behind Expansion Bridge 
2 would complete, or the Processor 1 read of the ISA target 
would complete. This is a Type-A (PCI Bus 2 stall) and 
Type-B LockUp (ISA bas block). Since neither the PCI Bus 
2 stall or the ISA bus block can be prevented, the deadlock 
is avoided by the ARBus arbiter to prevent Expansion 
Bridge 2 from reading Expansion Bridge 1 if Expansion 
Bridge 2 has an outstanding read. In general terms, if an 
expansion bridge-A has an outstanding ARBus Master's 
Slave Read, then the ARBus arbiter should prevent 
(ARTRY*) an expansion bridge-A from reading an expan- 
15 sion bridge-B until the outstanding read has completed. 
Referring to FIG. 18, deadlock may occur when one 
expansion bridge, with an outstanding ARBus read, accepts 
a read from another expansion bridge. A typical sequence of 
transactions is as follows: 
20 1. Expansion Bridge 1 accepts two ARBus to PCI Bus 1 
writes. Meanwhile, a PCI Bus Master on PCI Bus 1 has 
initiated a read access from a target behind Expansion 
Bridge 2. 

2. Expansion Bridge 2 accepts a read from Processor 2 to 
25 the PCI2PCI bridge, followed by a read from Expan- 
sion Bridge 1. Meanwhile, Expansion Bridge 2 also 
accepts two PCI Bus to Expansion Bridge 1 write 
cycles. 

3. The Processor 2 read of the PCI2PCI bridge causes the 
30 bridge to attempt to flush posted write data to main 

memory. Since all buffers are filled in the direction of 
PCI Bus 2 to PCI Bus 1, and PCI Bus 1 is stalled, the 
PCI2PCI bridge cannot flush its data. 
The problem with this scenario is that the two PCI buses 

35 have conflicting completion orders. Since Expansion Bridge 
2 AAck*cd Expansion Bridge 1's read, PCI Bus 1 has 
committed to completing the read before allowing any other 
accesses to occur, thereby stalling the PCI Bus. The Pro- 
cessor 2 read of the PCI2PCI bridge bas kicked off the 

40 interlocking behavior of the bridge. The PCI2PCI bridge 
will not service the read until it has completed its writes. 
Unfortunately, to complete its write, an access must occur on 
PCI Bus 1 to free up some buffer space. PCI Bus 2 won't 
service the read until it executes the write, and PCI Bus 1 

45 won't service the write until it completes the read. This is a 
Type-A (PCI Bus 1 stall) and Type-B LockUp (PCI2PCI 
bridge block). Since neither the PCI Bus 1 stall or the 
PCI2PCI bridge block can be prevented, the fix is for the 
ARBus arbiter to prevent Expansion Bridge 1 from reading 

50 Expansion Bridge 2 if Expansion Bridge 2 has an outstand- 
ing read. In general terms, if an expansion bridge has an 
outstanding ARBus master's Slave Read, then the ARBus 
arbiter should prevent (ARTRY*) another expansion bridge 
from reading that expansion bridge until the outstanding 

55 read has completed. 

The following summary is a compilation of the foregoing 
rules. Items below in italic text are deadlock avoidance rules 
for which an expansion bridge is responsible, and items 
below in plain text are deadlock avoidance rules for which 

60 the ARBus arbiter or processor bus arbiter is responsible. 
Al. The ARBus arbiter must prevent an ARBus master 
from reading main memory (via ARTRY*) if that master has 
an outstanding read of an expansion bridge. 

A2. The ARBus arbiter must prevent an access by an 

65 ARBus master that would cause a second level cache hit or 
allocate (via ARTRY*) if that master has an outstanding read 
of an expansion bridge. 
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A3. The ARBus arbiter must prevent a snooping ARBus 
master from writing to an expansion bridge if the master has 
an outstanding read of an expansion bridge to allow for 
required Snoop Push write transactions. Dead Lock Avoid- 
ance Rules for Multiple Expansion Bridges, Split Bus, Fixed 
Priority, Ordered Masters and Slaves: 

Bl. An expansion bridge must disregard ARBus protocol 
and take the address tenure if a Bus Grant occurs concurrent 
with an ARtry*. 

B2. An expansion bridge must disregard ARBus protocol 
and keep its Bus Request asserted the clock following an 
ARTRY* if it is the source of the ARTRY*. 

B3. An expansion bridge must disallow a read of its slave 
while it has an outstanding master read transaction and its 
corresponding data tenure has not begun. 

B4. The ARBus arbiter must prevent one master from 
reading both expansion bridges. 

B5. ISA bus master cards must not read targets behind the 
opposite bridge. Software must restrict target accesses from 
ISA to the same bridge or main memory. 

B6. An expansion bridge must retry PCI Bus master reads 
if it has a PCI Bus master write transaction queued up in 
front of it that has not completed. 

B7. If an expansion bridge has an outstanding ARBus 
master's Slave Read, the ARBus arbiter must prevent 
(ARTRY*) that expansion bridge from reading another 
expansion bridge until the read completes. 

B8. If an expansion bridge has an outstanding ARBus 
master's Slave Read, the ARBus arbiter must prevent 
(ARTRY*) another expansion bridge from reading that 
expansion bridge until the read completes. 

As noted above, some of the deadlock avoidance rules are 
implemented in the expansion bridge itself. Others of the 
deadlock avoidance rules are implemented in the system 
arbiter. In either case, the general technique employed is to 
detect a deadlock hazard, a condition which, if a single 
further "deadlocking" transaction were accepted, would 
result in deadlock and, if that transaction is requested, 
refusing to accept it by issuing a retry signal. 

Referring again to FIG. 6, a block 613 monitors the state 
of the master queues 601 to detect a deadlock hazard, and 
monitors the BG and SACK vectors to detect a deadlocking 
transaction. When such a transaction is requested, an 
ARTRY signal is generated, causing the transaction to be 
backed off instead of being accepted and queued. 

Each master queue locally generates two signals, a 
ValidBrlRd signal, indicating that the master has a read to 
Expansion Bridge 1 pending, and a ValidBr2Rd signal, 
indicating that the master has a read to Expansion Bridge 2 
pending. These signals are bussed to the block 613 instead 
of actual queue entries. 

From the foregoing signals, the block 613 detects dead- 
lock hazards. Also input to the block 613 are the BG vector 
and the SACK vector, which together indicate the master/ 
slave pair for a requested transaction. From the latter signals, 
the block 613 detects deadlocking transactions and in 
response generates a ARTRY signal. 

Referring again to FIG. 7, in the case of the expansion 
bridge 700, deadlock avoidance is implemented in an 
ARBus control block 710 and in a PCIBus control block 
720. In particular, an Address Master slate machine 
[AMst601] causes the expansion bridge 700 to disregard the 
ARBus protocol and lake address tenure if a Bus GRant 
occurs concurrent with an ARTRY signal. Likewise, the 
Address Master state machine [AMst601] causes the expan- 
sion bridge 700 to disregard the ARBus protocol and keep its 
Bus Request asserted the clock following an ARTRY if it is 
the source of the ARTRY 



An Address Slave state machine and a PCI Master state 
machine each implement a further deadlock avoidance rule 
in similar manner as described previously in relation to the 
system arbiter. That is, a deadlock hazard is detected, during 

5 which if a deadlocking transaction is detected, that transac- 
tion is refused. In particular, the Address Slave state machine 
disallows a read of its slave while it has an outstanding 
master read transaction and its corresponding data tenure has 
not begun. The PCI Master state machine retries PCI Bus 

10 master reads if it has a PCI Bus master write transaction 
queued up in front of it that has not completed. 

Use of the described deadlock avoidance techniques 
enables a high-performance split-transaction system bus to 
be interfaced to a single-envelope expansion bus without 

15 compromising system reliability. Rather than the character- 
istics of the expansion bus limiting the performance of the 
system bus, performance of the system bus may be sepa- 
rately optimized. As a result, overall system performance is 
greatly improved. 

20 Increased Efficiency by Allowing Transaction Independence 
Within Slave Devices 

The description thus far has assumed a system in which 
both masters and slaves are ordered. In particular, the 60X 
microprocessor assumes that its transactions are ordered. As 

25 a consequence, master ordering is to some extent ingrained 
within the underlying system architecture. Slave ordering, 
on the other hand, although it may be convenient from an 
implementation perspective, is not required. Increased effi- 
ciency may be achieved by relaxing the constraint of slave 

30 ordering, thereby allowing transaction independence within 
slaves. To achieve unordered slaves, additional information 
must be exchanged between the slaves and the arbiter. As 
before, this information may be exchanged in the form of 
additional side-band signals not provided for by the 

35 MPC60X bus specification. 

Referring to FIG. 19, a block diagram is shown of a 
modified arbiter that allows for unordered slaves. The arbiter 
of FIG. 19 differs from the arbiter of FIG. 6 principally in the 
signals input to and output from the blocks ArbMux 603, 

40 ArbDatSM 604 and ARtryGen 613, as well as in the logical 
function of these blocks. In other respects, the arbiter of FIG. 
19 and the arbiter of FIG. 6 remain substantially the same. 
Like designations have therefore been used in FIG. 19 as in 
FIG. 6, with the ArbMux, ArbDatSM and ARtryGen blocks 

45 being differentiated by prime designations 603 ! , 604' and 
613', respectively. 

Considering first the block ArbMux 603', in order to allow 
for transaction independence within slaves, the ArbMux 603' 
receives as inputs all of the queue entries of all of the slave 

50 queues (instead of just all of the front entries as in FIG. 6). 
Therefore, if the masters are numbered 0 through M, the 
slaves are numbered 0 through S and the queues locations 
within each slave queue are numbered 0 through Q, then the 
ArbMux 603' receives (S+l)(M+l+lXQ+l) bits of informa- 

55 tion from the slave queues. One of the bits in the expression 
(M+l+1) is a valid bit that allows for a flop-based queue 
implementation instead of one requiring random-access 
memory. In an exemplary embodiment with S=5, M=4, and 
Q=2, the number of bits received from the slave queues is 

60 6x6x3=108 bits. Since masters remained ordered, the Arb- 
Mux 603' continues to receive only the front entries from the 
master queues, the same as in FIG. 6. In the illustrated 
embodiment, the ArbMux 603* receives from the master 
queues (M+lXS+l+l+l)=5x8=40 bits. As in the case of the 

65 slave queue entries, one of the bits in the expression (S+1 + 
1+1) is a valid bit. The extra bit in the expression (S+ 1+1+1) 
is a read/write bit as described previously. 
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Furthermore, the ArbMux 603', instead of receiving only to and including the last slave device, Slave S+l, whose 

a single RDDA signal from each slave, now receives an inputs include RDDA< S+1)0 , RDDA^^, . . . , RDDA( 5+J) 

RDDA signal for each slave queue entry. In the illustrated (<?+J)> 

embodiment, the ArbMux 603' therefore receives (S+1¥Q+ f , n i -r ■ 

l)-6x3=18 5j ts 5 ' a lransacllOD 1S allowed to proceed only it it is 

In the arbiter of FIG. 6, the ArbMux 603 outputs two bits the frommosl transaction of both the master and the slave. 

(SlvMaich and SIvRdReady) for each master in the system. Tne matching queue location within the slave is by defim- 

The ArbMux 603' of FIG. 19, on the other hand, outputs two tion always the frontmost valid queue location within the 

bits for each master for each queue location. Hence, the slave. In the case of ArbMux 603 of FIG. 6, therefore, its 

ArbMux 603' outputs 2(M+1)(Q+1) bits which are input to 10 function is to identify masters whose next transaction in 

the ArbDatSM 604'. In the illustrated embodiment, the order is also the next transaction in order of the target slave 

ArbMux 603' outputs 2x5x3=30 bits for input to the Arb- device. In the case of the ArbMux 603' of FIG. 19, slave 

DatSM 604'. The front queue entries from each of the master ordering is no longer required. Hence, the function of the 

queues are input to the ArbDatSM 604' as before. ArbMux 603' is to identify for each master the queue 

The ArbDatSM 604 of FIG. 6 produces two sets of output 15 location within the target slave that matches the frontmost 

signals, DBG and SSD. The DBG output signals remain transaction of the master. The ArbMux 603' also indicates 

unchanged in the case of the ArbDatSM 604'. One DBG whether transacti on data for that queue location is ready, 

signal is output for each master for a total of M+l DBG HencC) for each masler> tWQ ^ a slvMatch bit and a 

signals Instead of outputting out a single SSD signal for slvRdRead bit , are out t for each bcation In the 

each slave device, however, the ArbDatSM 604 outputs an 20 c ^ q{ m ^ b{{ * 603 , 

SSD signal for each queue location within each slave device, . . . . * . . JT \. ' .... 

for a total of (S+1XQ+1) bits (6x3=18 bits in the illustrated " e desi S na,ed ^oOo. M 0 Q„ .... M 0 Q (p+ ? and likewise 

embodiment) e succeeding master up to and including the last 

The ArbDatSM 604' receives multiple address coinci- master ^W)' the out P uts for which are M (M + i)Q 0> 

dence (AC) signals from each of the slave devices. In the 25 M ("+iA> ■ • • » M (Af+1) Q (<?+1) . If a master has a valid 

illustrated embodiment the ArbDatSM 604' receives from transaction in its queue, then for the frontmost valid 

each slave device a separate signal for every possible pair of transaction, the SlvMatch signal for that master that corre- 

queue entries within the slave device, indicating whether the sponds to the matching target slave queue location will be 

same cache line is the target of both transactions queued asserted. If the master has no valid transaction in its queue, 

within the pair of queue entries. In general there are Q(Q+ 30 then no signal is asserted for that master. 

l)/2 possible pairs of queue entries within a slave device. The inputs and outputs of ArbMux 603' are illustrated in 

The ArbDatSM 604' therefore receives (S+l)[Q(Q+l)/2] greater delail in Fia 2 1 for the case M-4, S=5 and Q=2. 

total address coincidence bits or, in the illustrated D e t - 4 A „ . r 

embodiment,6x2x3/2=18bits.Tl 1 eARtr y Genblock613',in ™™Z\*l HG - ** h * ° T 

addition to the BG and SACK vector inputs previously 35 ^atSM , 604 ™ « dctai The oiUpuls 

described in relation to the ARtryGen block 613 of FIG. 6, of the ArbMux 603 descri ^ ed previously are shown as being 

also receives the same address coincidence signals. m P ul to the ArbDatSM 604' at a top edge thereof. TTiese 

In the case of some slave devices, the average latency of m V nis m used b y lhe ArbDatSM 604' to determine which 

the slave device may be reduced by reordering transactions master 1S to 06 S ramed the bus by asserting one of the Data 

involving the slave device. In the case of DRAM, for 40 Bus Granl signals DBG 0 through DBG M+1 output by the 

example, page mode reads take less time than non-paged ArbDatSM 604'. The same inputs are also used by the 

reads. Hence, in the embodiment of FIG. 19, the ArbDatSM ArbDatSM 604' to determine which SSD signal of the target 

604' further receives page coincidence (PQ signals from at slave is to be asserted according to the queue location that 

least one slave device, i.e., DRAM. The ArbDatSM block the transaction occupies within the slave queue. Which slave 

604' receives from the slave device a separate signal for 45 is in fact the target slave is identified by the frontmost master 

every possible pair of queue entries within the slave device, queue entries, shown as being input to the ArbDatSM 604' 

indicating whether the targets of both transactions queued at a left edge thereof in like manner as in FIG. 6. 

within the pair of queue ^entries are within the same page ^ ^j^m 604' outputs an SSD signal corresponding 

TTte ArbDatSM block 604' therefore receives Q(Q+W total t0 each slave loc ^ Q Hence> for | lave 0 t * * 

SSflSb? 1 ™ ^ m embodiment, 50 of ^ 604 , inchlde SSDqo> ? ^ 

Referring now to FIG. 20, the inputs and outputs of the ™? 50 forth for e / ch sla u ve "P \° ^j**^?* SIave 

ArbMux block 603' are illustrated in greater detail. For each S +J/ the oul P uts for which ,nclude SSD (^do. 

master M 0 through M {M + 1)y the ArbMux 603' receives the SSD (s+i)i» • • • » SStys+ixe+i)- 

frontmost queue entry, represented as Q 0 . The inputs from 55 The inputs to the bottom edge of the ArbDatSM 604' are 

the master queues to the ArbMux 603' are therefore repre- used by the ArbDatSM 604' to ensure that data dependencies 

sented as MqQq, MjQ 0 , . . . , M (A/+1) Q 0 . are observed and to realize a further optimization as 

In the case of the slave queues, every slave queue entry is described more fully hereinafter, 

input into the ArbMux 603'. Hence, for the slave queue S 0 , ln its basic 0 p er ation, the ArbDatSM 604' performs the 

inputs to the ArbMux 603' include S 0 Q 0 , S 0 Q 19 . . . , 60 following functions: 
SqQ^j), and likewise for each slave queue in sequence up 

to and including the last slave queue S (M) , whose inputs Determines the highest priority master having a trans- 
include S (5+1) Q 0 , S (S+1) Q lf . . . , S (5+1) Q (<?+1) . The ArbMux action "ready to go" based on: 
603' receives from the slave devices themselves individual a) the SlvMatch bits for all of the masters; 
Read Ready signals for each queue location. From Slave 0, 65 b) the read/write bits from the frontmost queue loca- 
therefore, the ArbMux 603' receives RDDA 00 , tions of all of the master queues; and 
RDDAq,, . . . , RDDAq^j), and likewise for each slave up c) the SIvRdReady bits for all of the masters. 
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2. Asserts the corresponding DBG signal for the winning actions such that they must be executed in order. To prevent 
master; and the transaction in queue entry 1 from being executed before 

3. Asserts the correct SSD signal for the target slave based the transaction in queue entry 0, the SlvMatch bits of Master 
on: 0 are modified, e.g., changed from 010 to 000. The same 

a) the SlvMatch bits for the winning master; and 5 modification is performed for each arbitration cycle until the 

b) the SACK vector in the frontmost queue location of transaction in queue entry 0 has executed. The address 
the winning master. coincidence bits for Slave 0 will then be 000. The SlvMatch 

As may be appreciated from the foregoing description, the bits of Masler 0 then> iostead of being modified, remain 010 

system of FIG. 19 is much more loosely coupled than the such that lhe transaction in queue entry 1 may be executed 

system of FIG. 6. The loosely-coupled nature of the system 10 next ^ Master 0 ^ the wirming master . 

of FIG. 19 may be taken advantage of to improve the way Xhe ^CatSM 604' uses the page coincidence (PC) bits 

in which deadlocks are avoided. in a similar manner> not to enforce data dependencies but to 

As previously described in relation to FIG. 6, slave reduce slave latency and boost system performance. In the 

ordering is a major cause of deadlock. When what would iii ustr ated embodiment, PC bits are received from DRAM 

otherwise be a deadlocking transaction is detected, it is 15 only. In other embodiments, PC bits may be received from 

"killed" by issuing an ARtry signal. Without slave ordering, otner or additional slave devices . ^ slave device ^ 

a large proportion of what would otherwise be deadlocking responsible, once a PC bit has been asserted, to keep that PC 

transactions, instead of being killed, can now be accepted bit asserted until bolh of the page . coincident transactions 

and reordered in relation to other transactions so as to avoid have been executed (or, more precisely, scheduled for 

deadlock. Such reordering is not possible, however, when a 20 execu tj on ) 

data dependency exists. For example, a read of one data In operation, the ArbDatSM 604* determines to which 

location by one device followed by a write of the same data masters the pc bits will be applied> which maslers have 

location by another device does not yield the same result as a DRAM transaction at the front of their queues, in accor- 

if the execution order is reversed. If a deadlock situation dance with the SACK vectors at the head of the master 

cannot be avoided by transaction reordering because of a ^ queues The rc bits afe lbeQ used (0 determine which queue 

data dependency, the need remains to kill the deadlocking i ocati ons cannot have the transactions queued therein go 

transaction. next w i tnout forfeiting the speed advantage to be gained 

Of course, data dependencies may also exist absent any from paged access . In practice, if a PC bit is asserted, the 

potential deadlock situation. Observing such data dependen- transactions to which the PC bit relates will be scheduled for 

cies will not cause any transaction to be killed as in a 30 execution prior to any other transactions involving the 

deadlock situation, although it may reduce somewhat the DRAM. In other words, if the DRAM has three transactions 

utilization of the bus. queued> ^ Q of which are t0 the same page> the execution 

Information regarding data dependencies is input to the order wil l be COINCIDENT, COINCIDENT, NON- 

ArbDatSM 604' in the form of address coincidence (AC) COINCIDENT, instead of NON-COINCIDENT, 

signals from each of the slaves. Using this information, the 35 COINCIDENT, COINCIDENT, although both sequences 

ArbDatSM 604' schedules transactions so as to observe all yicld lhc ^ ^ advantage . l n other embodiments, any 

data dependencies. For each of slave devices 0 to S+l, the execution order that results in the page^oincident transac- 

ArbDatSM 604' receives Q(Q+l)/2 address coincidence bits. tions bemg executed one after another without any inter- 

In the case of Q=2, for example, the ArbDatSM 604' receives vening traDsaction may be acceptable for purposes of the PC 

three address coincidence bits from each slave: AC 01 , AQ^, 40 

and AC 12 , each indicating that the two subscripted queue The AC ^ pc bits may be rega rded as control inputs to 

locations have target addresses within the same cache line. a 5it fiher thal ope r a t e s upon the SlvMatch bits, as shown in 

In operation, the ArbDatSM 604' uses the address coin- p|Q 23 

cidence signals as follows: ^ inputs and oulputs of a^d^sm 604' are illustrated 

1. The ArbDatSM selects for each master a set of address 45 m greater detail in FIG. 24 for the case M-4, S=5 and Q=2. 
coincidence bits from a particular slave in accordance Referring to FIG. 25, the inputs and outputs of the 
with the SACK vectors at the head of the respective ARtryGen block 613' are illustrated in greater detail. The 
master queues. inputs along the top and left edges of the ARtryGen block 

2. Each selected set of address coincidence bits is used to 613' remain unchanged compared to the ARtryGen block 
determine for that particular slave device which queue 50 613 of FIG. 6. Unlike the ARtryGen block 613 of FIG. 6, 
location or locations cannot have the transaction however, the ARtryGen block 613', instead of generating 
queued therein go next without violating a data depen- ARtry based on the assumption of ordered slaves, uses 
dency. certain deadlock address-coincidence (DLAC) inputs 

3. For each master, the SlvMatch bits input to the Arb- received at the bottom edge of the block to generate a 
DatSM are modified in accordance with the results of 55 "qualified" ARtry signal only when a data dependency 
Step 2 to turn off selected SlvMatch bits, if necessary, prevents transactions from being reordered so as to avoid the 
in order to ensure that data dependencies are observed. deadlock. The slave devices each monitor each system bus 

To take a concrete example, assume that the frontmost address tenure and compare the address placed on the bus to 

queue entry for Master 0 designates Slave 0. Assume further addresses queued within the respective slave devices. If the 

that the SlvMatch bits for Master 0 are 010, indicating that 60 address on the bus is the same as an address already queued 

the match is for queue entry 1 of Slave 0. Without taking into within the slave device, the slave device raises its DLAC 

account the address coincidence bits of Slave 0, the trans- signal to the ARtryGen block 613'. All slave devices or only 

action in queue entry 1 will be executed if Master 0 is the selected slave devices (most importantly DRAM) may 

winning master. Now assume that the address coincidence monitor the bus and signal the ARtryGen block 613' in this 

bits of Slave 0 are 100, indicating that the transactions 65 manner. In the illustrated embodiment, all slave devices are 

within queue locations 0 and 1 are directed to the same cache assumed to provide a DLAC signal. The ARtryGen block 

line. A data dependency therefore exists between the trans- 613' therefore receives signals DLAQ, through DLAC ( ^, V 




